DEVELOPMENT AND COMPARISON OF M-ESTIMATORS FOR LOCATION
ON THE BASIS OF THE ASYMPTOTIC VARIANCE FUNCTIONAL


A new approach for comparison and modification of M-estimators is
introduced and implemented. The problem considered is that of robust
estimation of a location parameter. Specific attention is given to the
$\epsilon$-contaminated normal model. The analytical method introduced is
based upon the asymptotic variance of the estimator, the asymptotic variance
being considered as a functional over the space of distribution functions.
The behavior of this functional is investigated with respect to special
sub-families within a neighborhood of the "central" distribution. With
respect to the normal location problem, three robust estimators from the 1972
Princeton Monte Carlo study are examined: the "Huber" H15, the "Hampel" 25A,
and the sine function M-estimator AMT. Using the asymptotic variance
functional analysis as both an analytical and intuitive tool, three modified
estimators are suggested and developed. All six estimators are then compared
at selected distributions heavier tailed than the normal. Besides its
analytical and intuitive appeal, this functional approach offers a
cost-saving alternative to Monte Carlo methods.


1.  INTRODUCTION


The notion of M-estimation was introduced by Huber [8] in order to develop
a robust estimator for the mean of a normal distribution. The approach has
since played a central role in robustness theory for location estimation and
has been directed toward other estimation problems as well. For a theoretical
overview see Huber [9] and for a broad Monte Carlo study see Andrews et al.
[1]. Some features of M-estimation apropos to the motives of the present
investigation are:

    functional characterizations of parameters are natural;
    computations via iterative methods are relatively easy;
    fine tuning of the estimators is straightforward;
    the M-estimators form an "admissible" class in a sense made
    precise by Hampel [7, p. 391].

The present paper introduces a new approach for comparison of M-estimators.
It is in character an analytical tool, based on asymptotic variance
considered as a functional, but the approach also facilitates the use of
intuition and provides a cost-saving alternative to Monte Carlo methods. We
utilize the approach to illuminate some previous M-estimators and to develop
new ones. Attention is confined to the location problem with scale known.

Let us define what is meant by an M-estimator for a location parameter
in the case of scale known. Consider i.i.d. observations from a
distribution $F_\theta(x) = F(x - \theta)$, for some distribution $F$. In
this case $\theta$ is a real-valued location parameter. Let $\psi$ be any
function such that $\theta$ is the solution $T$ of the equation

(1.1)    $\int \psi(x - T)\, dF_\theta(x) = 0 .$

Then a natural estimator of $\theta$ is given by the solution $\hat\theta$
of the equation

(1.2)    $\int \psi(x - \hat\theta)\, dF_n(x) = 0 ,$

where $F_n$ denotes the usual sample distribution function. We call this the
M-estimator of $\theta$ corresponding to $\psi$ and sometimes designate the
estimator by referring to the function $\psi$ rather than the solution
$\hat\theta$. (For some choices of $\psi$, the equations (1.1) and (1.2) may
have multiple solutions. We assume implicitly that in such a case there is a
procedure for selecting a single one of the solutions as appropriate in the
role of $\theta$ or $\hat\theta$.) Note that (1.1) implies the relation

(1.3)    $\int \psi(x)\, dF(x) = 0 \quad \left( = \int \psi(x - \theta)\, dF(x - \theta) \right)$

between $\psi$ and $F$. Note also that $\theta$ takes the value 0 at the
distribution $F$.

If $F$ has mean 0, in which case the location parameter $\theta$ may be
represented as the mean of $F_\theta$, then (1.1) is fulfilled for
$\psi(x) = x$ and the corresponding M-estimator given by (1.2) is the sample
mean. Note that in this case the equation (1.2) for $\hat\theta$ is
equivalent to the equation for obtaining $\hat\theta$ by the method of least
squares. If $F$ is symmetric with unique median 0, then the M-estimator
corresponding to $\psi(x) = -1$, 0, or 1 according as $x < 0$, $= 0$, or
$> 0$, is the sample median and is equivalent to estimation by the method of
least absolute values. If $F = \alpha G + (1 - \alpha)H$, where
$0 \le \alpha \le 1$, $G$ is absolutely continuous with density $g$, and $H$
is discrete with mass function $h$, and if $f = \alpha g + (1 - \alpha)h$ is
differentiable, then for $\psi = -f'/f$ the solution of (1.2) is the maximum
likelihood estimator of $\theta$. Note that this choice of $\psi$ depends
upon the particular distribution $F$ generating the location model.
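
The estimating equation (1.2) can be illustrated with a minimal sketch. It assumes a nondecreasing, skew-symmetric $\psi$, so that the sum of $\psi(x_i - T)$ is nonincreasing in $T$ and bisection applies. The function names, the truncation constant 1.5 and the sample values are illustrative assumptions, not taken from the text.

```python
def psi_truncated(x, k=1.5):
    """Identity near 0, truncated at +/-k (k is an assumed tuning constant)."""
    return max(-k, min(k, x))


def m_estimate(data, psi, tol=1e-10):
    """Solve sum(psi(x_i - T)) = 0 for T by bisection.

    Assumes psi is nondecreasing and odd, so the sum is nonincreasing in T
    and changes sign between min(data) and max(data).
    """
    lo, hi = min(data), max(data)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(psi(x - mid) for x in data) > 0:
            lo = mid  # estimating equation still positive: root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


data = [-0.3, 0.1, 0.4, 0.9, 12.0]            # hypothetical sample with one outlier
mean_est = m_estimate(data, lambda x: x)       # psi(x) = x recovers the sample mean
trunc_est = m_estimate(data, psi_truncated)    # truncation limits the outlier's pull
print(mean_est, trunc_est)
```

With $\psi(x) = x$ the root is the sample mean, while the truncated $\psi$ gives the outlier only a bounded contribution to the estimating equation.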

Under suitable regularity conditions on $\psi$ and $F$,
$n^{1/2}(\hat\theta - \theta)$ is asymptotically normal with mean 0 and
variance (independent of $\theta$ since $\theta$ is a location parameter)
given by

(1.4)    $A(F) = \dfrac{\int \psi^2(x)\, dF(x)}{\left[ \int \psi'(x)\, dF(x) \right]^2} .$

(See, for example, Rao [10, p. 378] or Huber [8].) Clearly, $A(F)$ may be
viewed as a functional defined on a class of possible distribution functions
$F$. This characterization of the asymptotic variance is helpful in two ways:
it facilitates looking at $A(F)$ for small departures from $F$, and it leads
into the use of existing theory of functionals in the analysis of $A(F)$.
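
As a rough numerical illustration of (1.4), the sketch below evaluates $A(F)$ at the standard normal by a midpoint rule. The integrator, the grid limits and the truncation constant 1.5 are assumptions made for this example only.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_expect(g, lim=10.0, n=20000):
    """Midpoint-rule approximation of the integral of g dPhi over [-lim, lim]."""
    h = 2.0 * lim / n
    total = 0.0
    for i in range(n):
        z = -lim + (i + 0.5) * h
        total += g(z) * phi(z)
    return total * h


def asymptotic_variance(psi, dpsi):
    """A(Phi) from (1.4): integral of psi^2 over the squared integral of psi'."""
    return normal_expect(lambda x: psi(x) ** 2) / normal_expect(dpsi) ** 2


K = 1.5  # illustrative truncation point


def psi_trunc(x):
    return max(-K, min(K, x))


def dpsi_trunc(x):
    return 1.0 if abs(x) < K else 0.0


a_mean = asymptotic_variance(lambda x: x, lambda x: 1.0)  # psi(x) = x
a_trunc = asymptotic_variance(psi_trunc, dpsi_trunc)
print(round(a_mean, 3), round(a_trunc, 3))
```

Under these assumptions the sample mean gives a value near 1, and the truncated $\psi$ gives a value slightly above 1, which is the small efficiency loss at the normal that buys protection elsewhere.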

The notion of considering the estimator $\hat\theta$ as a statistical
functional $T(F_n)$, based on the representation $\theta = T(F_\theta)$
expressed by (1.1), is an established approach developed by von Mises [11],
further developed by Filippova [5] and Hampel [6], [7], and exploited in
Andrews et al. [1]. The novelty of the present investigation is that the
"functional approach" is directed not toward the estimator but rather toward
the asymptotic variance of the estimator. In a similar vein, we note that
Bickel and Lehmann [2, p. 1058] have expressed interest in notions of
continuity of $A(F)$ in regard to the problem of efficiency of estimators.

Basically, we want to find estimators of $\theta$ which are asymptotically
efficient at $\Phi$, the standard normal, but which also protect against
long-tailed contamination. In Section 2 we develop the basic techniques for
analyzing, by the functional approach, the asymptotic variance $A(W)$ of an
M-estimator as $W$ ranges through an $\epsilon$-neighborhood of the form

    $\{ W : W = (1 - \epsilon)F_0 + \epsilon G ,\ G \text{ symmetric} \} .$

Here $F_0$ is a fixed "central" distribution. In the sequel we take $F_0$ to
be $\Phi$. The techniques of Section 2 are utilized in Section 3 to shed new
light on several known estimators and to set up guidelines for building new
estimators. In Section 4 we develop three new estimators, which are compared
in Section 5 with the known estimators of Section 3. As brought out in
Section 5, these analytical techniques for comparison of estimators not only
have intuitive appeal but also provide a competitor to the Monte Carlo
approach. Generalizations and ramifications are considered in Section 6.


2.  THE  ASYMPTOTIC  VARIANCE  FUNCTIONAL 

We consider M-estimation for a location parameter, as described in
Section 1. Related to a particular $\psi$ function is an asymptotic variance
functional $A(F)$, given by (1.4), defined on distribution functions $F$.
This characterization of the asymptotic variance parameter of the
M-estimator based on $\psi$ allows us to draw upon the powerful functional
approach introduced in the statistical context by von Mises [11].

DEFINITION 2.1. A functional $T$ is $m$ times differentiable at a point
(distribution) $F$ with respect to a convex set $\mathcal{T}$ of
distributions if for each $W \in \mathcal{T}$ and for each $t \in [0, 1]$,
the derivative

(2.1)    $\dfrac{d^m}{dt^m}\, T[F + t(W - F)]$

exists.

Corresponding to this special differentiability, we have a Taylor
expansion.

THEOREM. Let $T(\cdot)$ be $m$ times differentiable at $F$ with respect to a
set $\mathcal{T}$. Let $W \in \mathcal{T}$. Then there exists $z \in (0, 1)$
such that

(2.2)    $T(W) - T(F) = \sum_{p=1}^{m-1} \dfrac{1}{p!} \dfrac{d^p}{dt^p}\, T[F + t(W - F)] \Big|_{t=0} + \dfrac{1}{m!} \dfrac{d^m}{dt^m}\, T[F + t(W - F)] \Big|_{t=z} .$


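In order to apply this theorem, we compute the first two derivatives of
the asymptotic variance functional $A(F)$. In carrying out these
computations, we assume that $F$ is symmetric about 0, that $\psi$ is
skew-symmetric ($\psi(-x) = -\psi(x)$), and that certain differentiations may
be taken under the integral sign. We obtain

(2.3)    $\dfrac{d}{dt} A[F + t(W - F)] = \dfrac{\int \psi^2\, d(W - F)}{\left[ \int \psi'\, dF + t \int \psi'\, d(W - F) \right]^2} - \dfrac{2 \left[ \int \psi'\, d(W - F) \right] \left[ \int \psi^2\, dF + t \int \psi^2\, d(W - F) \right]}{\left[ \int \psi'\, dF + t \int \psi'\, d(W - F) \right]^3}$

and

(2.4)    $\dfrac{d^2}{dt^2} A[F + t(W - F)] = - \dfrac{4 \left[ \int \psi^2\, d(W - F) \right] \left[ \int \psi'\, d(W - F) \right]}{\left[ \int \psi'\, dF + t \int \psi'\, d(W - F) \right]^3} + \dfrac{6 \left[ \int \psi^2\, dF + t \int \psi^2\, d(W - F) \right] \left[ \int \psi'\, d(W - F) \right]^2}{\left[ \int \psi'\, dF + t \int \psi'\, d(W - F) \right]^4} .$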


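Setting $t = 0$ in (2.3), we obtain

(2.5)    $\dfrac{d}{dt} A[F + t(W - F)] \Big|_{t=0} = A(F) \left\{ 1 + \dfrac{\int \psi^2\, dW}{\int \psi^2\, dF} - \dfrac{2 \int \psi'\, dW}{\int \psi'\, dF} \right\} .$

Denote this derivative $A^{(1)}_{F,W}$. Likewise (2.4) yields

(2.6)    $\dfrac{d^2}{dt^2} A[F + t(W - F)] \Big|_{t=0} = 2 A(F) \left( 1 - \dfrac{\int \psi'\, dW}{\int \psi'\, dF} \right) \left\{ 3 \left[ 1 - \dfrac{\int \psi'\, dW}{\int \psi'\, dF} \right] - 2 \left[ 1 - \dfrac{\int \psi^2\, dW}{\int \psi^2\, dF} \right] \right\} .$

Denote this by $A^{(2)}_{F,W}$. Clearly we could continue taking derivatives
and obtain the following expression for the asymptotic variance at a
distribution $W$:

(2.7)    $A(W) = A(F) + A^{(1)}_{F,W} + \tfrac{1}{2} A^{(2)}_{F,W} + \tfrac{1}{6} A^{(3)}_{F,W} + \cdots .$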

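Our interest lies in distributions $W$ which are "close" to the base
distribution $F$. Suppose that we take $W = (1 - \epsilon)F + \epsilon G$,
where $G$ can be viewed as a contaminating distribution. This particular
characterization of a contamination situation has been used by Huber [8] and
others and possesses desirable mathematical properties. In this case

(2.8)    $A^{(1)}_{F,W} = \dfrac{d}{dt} A[F + t((1 - \epsilon)F + \epsilon G - F)] \Big|_{t=0} = \dfrac{d}{dt} A[F + \epsilon t (G - F)] \Big|_{t=0} = \epsilon A^{(1)}_{F,G} .$

Similarly, $A^{(2)}_{F,W} = \epsilon^2 A^{(2)}_{F,G}$, etc. Thus

(2.9)    $A(W) = A(F) + \epsilon A^{(1)}_{F,G} + \tfrac{1}{2} \epsilon^2 A^{(2)}_{F,G} + \tfrac{1}{6} \epsilon^3 A^{(3)}_{F,G} + \cdots .$

For small $\epsilon$,

(2.10)    $A(W) \approx A(F) + \epsilon A^{(1)}_{F,G} .$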


This exhibits the role of $A^{(1)}_{F,G}$. It helps give an indication of the
robustness of the corresponding M-estimator as $G$ is allowed to range over
possible contaminating situations. Of course, in comparing different
estimators, the $A(F)$ contribution to (2.10) will dominate when $\epsilon$
is small. For $\epsilon$ moderate, $A^{(1)}_{F,G}$ asserts itself, so that an
estimator with a low asymptotic variance at the base distribution $F$ could
be "beaten" (in an overall sense) by an estimator with higher $A(F)$ but
lower $A^{(1)}_{F,G}$. For large $\epsilon$ we would look at the whole
functional

(2.11)    $A(W) = A[(1 - \epsilon)F + \epsilon G] = \dfrac{(1 - \epsilon) \int \psi^2\, dF + \epsilon \int \psi^2\, dG}{\left[ (1 - \epsilon) \int \psi'\, dF + \epsilon \int \psi'\, dG \right]^2} .$

Note that the expression (2.9) follows also from taking the Taylor expansion
of $A(W) = A[(1 - \epsilon)F + \epsilon G]$ viewed as a function of
$\epsilon$.
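
The relation between the exact functional (2.11) and the first-order approximation (2.10) can be checked numerically. The sketch below is only an illustration under stated assumptions: $F = \Phi$, $G$ normal with mean 0 and standard deviation 3, a $\psi$ truncated at 1.5, and a simple midpoint integrator. All names in it are hypothetical.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_expect(g, scale=1.0, lim=10.0, n=20000):
    """Midpoint-rule approximation of E[g(scale * Z)], Z standard normal."""
    h = 2.0 * lim / n
    total = 0.0
    for i in range(n):
        z = -lim + (i + 0.5) * h
        total += g(scale * z) * phi(z)
    return total * h


K = 1.5  # assumed truncation point


def psi(x):
    return max(-K, min(K, x))


def dpsi(x):
    return 1.0 if abs(x) < K else 0.0


def moments(scale):
    """(integral of psi^2, integral of psi') under N(0, scale^2)."""
    return (normal_expect(lambda x: psi(x) ** 2, scale),
            normal_expect(dpsi, scale))


num_f, den_f = moments(1.0)  # base distribution F = Phi
num_g, den_g = moments(3.0)  # contaminating distribution G = N(0, 3^2)
a_f = num_f / den_f ** 2
a1 = a_f * (1.0 + num_g / num_f - 2.0 * den_g / den_f)  # first derivative, as in (2.5)


def exact(eps):
    """Exact asymptotic variance at (1 - eps) F + eps G, as in (2.11)."""
    return ((1 - eps) * num_f + eps * num_g) / ((1 - eps) * den_f + eps * den_g) ** 2


def first_order(eps):
    """First-order approximation, as in (2.10)."""
    return a_f + eps * a1


for eps in (0.01, 0.1, 0.25):
    print(eps, round(first_order(eps), 3), round(exact(eps), 3))
```

The two columns agree closely for small $\epsilon$ and drift apart as $\epsilon$ grows, which is the point at which the higher-order terms of (2.9) start to matter.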

We see in (2.9) how the first term $A^{(1)}_{F,G}$ is important in
approximating the asymptotic variance $A(W)$. The quantity $A^{(1)}_{F,G}$ is
subject to another interpretation, as follows. If we put $G = \delta_x$, a
point mass at $x$, then $A^{(1)}_{F,\delta_x}$ is the "influence curve"
(Hampel [6], [7]) of $A(F)$. Considered as a function of $x$, it shows what
happens to the estimator $A(F_n)$ of $A(F)$ if an additional observation is
"thrown in" at the point $x$. This interpretation is of little relevance
here, however, since our objective is to find $T$ such that $T(F_n)$ is
robust for estimation of $\theta = T(F_\theta)$ rather than such that
$A(F_n)$ is robust for $A(F_\theta)$.

3.  COMPARISON  OF  THE  KNOWN  ESTIMATORS  H15,  25A  and  AMT. 

Huber [8] developed a theory of robust estimation for the location
parameter of a normal distribution. In the classical formulation, the
observations have distribution $\Phi(x - \theta)$, and an estimator which is
best in the sense of asymptotic variance is the maximum likelihood estimator.
This is the M-estimator corresponding to
$\psi(x) = -\Phi''(x)/\Phi'(x) = x$, or the sample mean. In Huber's
formulation, the observations have distribution $F(x - \theta)$, where $F$ is
assumed to belong to a class $C$ of distributions "close" to $\Phi$, and an
M-estimator is sought which minimaxes $A(F)$ in $C$. That is, the "robust"
M-estimator sought is the one which minimizes $\sup_{F \in C} A(F)$. In
particular, Huber considers, for fixed choice of $\epsilon$,
$0 < \epsilon < 1$, the class

(3.1)    $C = \{ F \mid F = (1 - \epsilon)\Phi + \epsilon G ,\ G \text{ symmetric} \} .$

The symmetry restriction makes the location parameter $\theta$ identifiable.
For this class, the minimax solution is given by the M-estimator
corresponding to

(3.2)    $\psi(x) = -\psi(-x) = \begin{cases} x , & 0 \le x \le k , \\ k , & x > k , \end{cases}$

where $k$ is determined by $\epsilon$ through

    $\dfrac{\epsilon}{1 - \epsilon} = \dfrac{2}{k} \Phi'(k) - 2 \Phi(-k) .$

This estimator may be characterized as the maximum likelihood estimator
for $\theta$ in the location model based on the distribution $F^* \in C$ with
smallest Fisher information $I(F) = \int (F''/F')^2\, dF$. In terms of $\psi$
functions, this robust (minimax) M-estimator modifies the classical
$\psi(x) = x$ by truncation for $|x|$ sufficiently large. The solution
$\hat\theta$ is a form of Winsorized mean.

Hampel [6], [7] suggested a modification designed to satisfy certain
qualitative criteria: low gross-error-sensitivity, high breakdown point,
small local-shift-sensitivity, and a not too distant rejection point.
Basically, he added to the $\psi$ function (3.2) a descending line segment,
producing

(3.3)    $\psi(x) = -\psi(-x) = \begin{cases} x , & 0 \le x < a , \\ a , & a \le x < b , \\ a \dfrac{c - x}{c - b} , & b \le x < c , \\ 0 , & x \ge c . \end{cases}$

This estimator has the advantage of completely rejecting outliers while
giving up very little efficiency (compared to the "Hubers") at the normal.

In the Princeton Monte Carlo study (Andrews et al. [1]) is found a
smoothed version of the "Hampel" (3.3), given by

(3.4)    $\psi(x) = -\psi(-x) = \begin{cases} \sin ax , & 0 \le x \le \pi/a , \\ 0 , & x > \pi/a . \end{cases}$

Several estimators of the Huber type (3.2), and one of the Hampel type
(3.3), were entered in the Princeton study and found to be quite "robust"
in a broad sense. In general, the "descenders" tended to outperform the
Hubers whenever the contaminating distributions were long-tailed and the
amount of contamination (the value of $\epsilon$) at least moderate. We shall
compare three specific estimators: H15, the Huber with $k = 1.5$; 25A, the
Hampel with $a = 1.69$, $b = 3.04$, $c = 6.4$; and AMT, the sine function
with $a = .7062$. In Figure A we have plotted the "influence curves" of these
three estimators. For a functional $T$ defined by (1.1), and differentiable
according to Definition 2.1, the influence curve is the derivative evaluated
at $G = \delta_x$:

(3.5)    $IC_{T,F}(x) = \dfrac{d}{dt}\, T[F + t(\delta_x - F)] \Big|_{t=0} = \dfrac{\psi(x)}{\int \psi'\, dF} ,$

so that the influence curve of an M-estimator is conveniently just the
$\psi$ function times a coefficient of proportionality. (The general
interpretation of an influence curve was mentioned at the end of Section 2.)
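
The three $\psi$ functions (3.2), (3.3) and (3.4), with the constants quoted above for H15, 25A and AMT, can be written down directly, as in the following minimal sketch. The function names are hypothetical. The normalizing constant in (3.5) is computed here as the integral of $x\,\psi(x)$ against the normal density, which equals the integral of $\psi'$ by integration by parts when $\psi$ is continuous and piecewise smooth. That shortcut is a choice made for this example, not something stated in the text.

```python
import math


def psi_h15(x, k=1.5):
    """Huber-type psi (3.2)."""
    return max(-k, min(k, x))


def psi_25a(x, a=1.69, b=3.04, c=6.4):
    """Hampel-type three-part descending psi (3.3)."""
    u = abs(x)
    if u < a:
        v = u
    elif u < b:
        v = a
    elif u < c:
        v = a * (c - u) / (c - b)
    else:
        v = 0.0
    return math.copysign(v, x)


def psi_amt(x, a=0.7062):
    """Sine-function psi (3.4)."""
    return math.sin(a * x) if abs(x) <= math.pi / a else 0.0


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def slope(psi, lim=10.0, n=20000):
    """Integral of psi' dPhi, evaluated as the integral of x * psi(x) dPhi."""
    h = 2.0 * lim / n
    total = 0.0
    for i in range(n):
        z = -lim + (i + 0.5) * h
        total += z * psi(z) * phi(z)
    return total * h


def influence_curve(psi, x):
    """IC(x) = psi(x) / integral of psi' dPhi, as in (3.5) with F = Phi."""
    return psi(x) / slope(psi)


for name, f in (("H15", psi_h15), ("25A", psi_25a), ("AMT", psi_amt)):
    print(name, [round(influence_curve(f, x), 3) for x in (1.0, 3.0, 7.0)])
```

At $x = 7$ the two descenders return an influence of zero, whereas the Huber-type curve stays at its plateau.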

(In the Princeton study, scale and location parameters were estimated
simultaneously, whereas here we are considering for simplicity the location
problem with scale assumed known and set equal to 1. The use of the median
deviation as a scale estimate made 25A and AMT have slightly different forms
in the Princeton study.)

We now center attention on the neighborhood of $\Phi$ defined by (3.1). One
choice for $F = (1 - \epsilon)\Phi + \epsilon G$ is given by
$G = \delta_{|x|}$, where $\delta_{|x|}$ places probability $\tfrac{1}{2}$
each at $x$ and $-x$. This distribution is somewhat extreme but helps us
understand what happens to the asymptotic variance if contamination consists
of an observation thrown in randomly at $x$ or $-x$. Using the results of
Section 2, we have

(3.6)    $A(F) \approx A(\Phi) + \epsilon A^{(1)}_{\Phi,\delta_{|x|}} ,$

where

(3.7)    $A^{(1)}_{\Phi,\delta_{|x|}} = A(\Phi) \left( 1 - \dfrac{2 \int \psi'\, d\delta_{|x|}}{\int \psi'\, d\Phi} + \dfrac{\int \psi^2\, d\delta_{|x|}}{\int \psi^2\, d\Phi} \right) = A(\Phi) \left( 1 - \dfrac{2 \psi'(x)}{\int \psi'\, d\Phi} + \dfrac{\psi^2(x)}{\int \psi^2\, d\Phi} \right) .$

In Figure B we plot $A^{(1)}_{\Phi,\delta_{|x|}}$ versus $x$ for each
estimator. It is seen that H15 is superior for $x \le 4$, but AMT and 25A
come into their own for $x > 4$. For $x > 4.46$ and $x > 6.4$, the
$A^{(1)}_{\Phi,\delta_{|x|}}$ values for AMT and 25A have dropped to 1.042
and 1.026, their respective asymptotic variances at $\Phi$, whereas the value
of $A^{(1)}_{\Phi,\delta_{|x|}}$ for H15 remains constant at 4.03. The
minimax properties of H15, as well as the pitfalls of such a principle, are
well illustrated by Figure B.
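
The plateau value quoted for H15 and the limiting value for AMT can be reproduced approximately from the point-contamination formula for $A^{(1)}_{\Phi,\delta_{|x|}}$ in (3.7). The sketch below assumes a simple midpoint integrator and hypothetical function names.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_expect(g, lim=10.0, n=20000):
    """Midpoint-rule approximation of the integral of g dPhi."""
    h = 2.0 * lim / n
    total = 0.0
    for i in range(n):
        z = -lim + (i + 0.5) * h
        total += g(z) * phi(z)
    return total * h


def a1_point(psi, dpsi, x):
    """First derivative of A at Phi toward point masses at +/-x, as in (3.7)."""
    num = normal_expect(lambda u: psi(u) ** 2)  # integral of psi^2 dPhi
    den = normal_expect(dpsi)                   # integral of psi' dPhi
    a_phi = num / den ** 2
    return a_phi * (1.0 - 2.0 * dpsi(x) / den + psi(x) ** 2 / num)


K = 1.5      # H15 constant
A = 0.7062   # AMT constant


def psi_h15(x):
    return max(-K, min(K, x))


def dpsi_h15(x):
    return 1.0 if abs(x) < K else 0.0


def psi_amt(x):
    return math.sin(A * x) if abs(x) <= math.pi / A else 0.0


def dpsi_amt(x):
    return A * math.cos(A * x) if abs(x) <= math.pi / A else 0.0


for x in (1.0, 3.0, 5.0):
    print(x, round(a1_point(psi_h15, dpsi_h15, x), 3),
          round(a1_point(psi_amt, dpsi_amt, x), 3))
```

Beyond the truncation point the H15 value no longer depends on $x$, while the AMT value falls back to the asymptotic variance at $\Phi$ once $x$ passes $\pi/a$.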

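Another choice for the contaminating distribution is $N_x$, normal with
mean 0 and variance $x^2$. This contamination model was used in the Princeton
study for $x = 3$ and $x = 10$, and in terms of $A^{(1)}_{\Phi,G}$ curves is
fairly representative of other diffuse contaminations (for example, the
Laplace distribution). Now, for $F = (1 - \epsilon)\Phi + \epsilon N_x$, we
have

(3.8)    $A(F) \approx A(\Phi) + \epsilon A^{(1)}_{\Phi,N_x} ,$

where

    $A^{(1)}_{\Phi,N_x} = A(\Phi) \left( 1 - \dfrac{2 \int \psi'\, dN_x}{\int \psi'\, d\Phi} + \dfrac{\int \psi^2\, dN_x}{\int \psi^2\, d\Phi} \right) .$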

Notice that the value of $A^{(1)}_{\Phi,N_x}$ for a particular $x$ is just
the integral of $A^{(1)}_{\Phi,\delta_{|\cdot|}}$ with respect to the measure
$N_x$. The plots of $A^{(1)}_{\Phi,N_x}$ in Figure C speak even more highly
in favor of the descenders. It is seen that H15 is clearly outperformed for
$x > 2.5$ and is competitive only for $x < 2.5$.

Although the Hubers achieve optimality in a certain (minimax) sense of
robustness, it was discovered empirically in the Princeton study that at
"typical" cases of contaminating distributions the Hubers are actually
outperformed by certain other estimators, particularly "descenders." Figures
B and C show analytically the basis of this phenomenon. For the Hubers,
$A(F)$ tends to remain at levels near $\sup_{F \in C} A(F)$ over a wide range
of $F \in C$; the opposite is the case for 25A and AMT.
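
A rough numerical version of the diffuse-contamination comparison evaluates $A^{(1)}_{\Phi,N_x}$ for H15 and AMT at a few values of $x$. The integrator and the function names are assumptions of this sketch, and the printed values are approximations rather than the figures' plotted data.

```python
import math


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_expect(g, scale=1.0, lim=10.0, n=20000):
    """Midpoint-rule approximation of E[g(scale * Z)], Z standard normal."""
    h = 2.0 * lim / n
    total = 0.0
    for i in range(n):
        z = -lim + (i + 0.5) * h
        total += g(scale * z) * phi(z)
    return total * h


def a1_normal(psi, dpsi, x):
    """First derivative of A at Phi toward N(0, x^2) contamination."""
    num_f = normal_expect(lambda u: psi(u) ** 2)
    den_f = normal_expect(dpsi)
    num_g = normal_expect(lambda u: psi(u) ** 2, scale=x)
    den_g = normal_expect(dpsi, scale=x)
    return (num_f / den_f ** 2) * (1.0 - 2.0 * den_g / den_f + num_g / num_f)


K = 1.5      # H15 constant
A = 0.7062   # AMT constant


def psi_h15(x):
    return max(-K, min(K, x))


def dpsi_h15(x):
    return 1.0 if abs(x) < K else 0.0


def psi_amt(x):
    return math.sin(A * x) if abs(x) <= math.pi / A else 0.0


def dpsi_amt(x):
    return A * math.cos(A * x) if abs(x) <= math.pi / A else 0.0


for x in (3.0, 10.0):
    print(x, round(a1_normal(psi_h15, dpsi_h15, x), 3),
          round(a1_normal(psi_amt, dpsi_amt, x), 3))
```

For both contamination scales used in the Princeton study ($x = 3$ and $x = 10$) the descender's derivative comes out below that of the Huber-type estimator in this sketch, in line with the qualitative reading of Figure C.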

These $A^{(1)}_{\Phi,G}$ plots not only help us compare old estimators, but
also allow us to set up criteria for building new estimators. As a first
criterion, it seems reasonable to require all robust M-estimators to have
descending $\psi$ functions. Secondly, the peak value of $A^{(1)}_{F,G}$, as
$G$ varies, should be kept low while maintaining efficiency at the base
distribution. Thirdly, the comparison of 25A and AMT in Figures A and B
suggests that smoothing at the bends of the 25A $\psi$ function should
improve the quality of the estimator. Accordingly, thinking of
$G = \delta_{|x|}$ as the "worst" kind of distribution (in the sense of
producing irregularity), we formulate a modified minimax objective: find an
estimator with a smooth and descending $\psi$ function having relatively low
asymptotic variance at the base distribution and having relatively minimal
peak value of $A^{(1)}_{\Phi,\delta_{|x|}}$. In Section 4 we develop some new
estimators which tend to fulfill this objective.


4.  DEVELOPMENT OF NEW ESTIMATORS H15D, 25AR AND QC45

We have seen in Section 3 that H15 needs a descending tail and 25A a
smoother descending tail. We can produce some new estimators by modifying
these old estimators. We introduce H15D with $\psi$ function

(4.1)    $\psi(x) = -\psi(-x) = \begin{cases} x , & 0 \le x \le 1.5 , \\ 1.5 , & 1.5 < x \le 2 , \\ 1.5 \sin(.5236x + .5236) , & 2 < x \le 5 , \\ 0 , & x > 5 , \end{cases}$

and 25AR with $\psi$ function

(4.2)    $\psi(x) = -\psi(-x) = \begin{cases} x , & 0 \le x \le 1.69 , \\ 1.69 , & 1.69 < x \le 3.04 , \\ 1.69 \sin(.4675x + .1596) , & 3.04 < x \le 6.4 , \\ 0 , & x > 6.4 . \end{cases}$


The choice of a sine function for the descending part is rather arbitrary, as
many other similar functions would suffice. One might think it important to
smooth the first bend of the $\psi$ function, but we have explored this and
found that it does not help in minimizing $A^{(1)}_{\Phi,\delta_{|x|}}$ and
definitely increases the asymptotic variance at $\Phi$.

We now exhibit a totally new estimator. In lieu of sine functions, we
consider a fairly simple one-parameter family of quadratic functions given by

(4.3)    $\psi(x) = -\psi(-x) = \begin{cases} cx - x^2 , & 0 \le x \le c , \\ 0 , & x > c . \end{cases}$

By computing the asymptotic variance and $A^{(1)}_{\Phi,\delta_{|x|}}$ curves
for several values of $c$, we find a range of $c$ values yielding robust but
fairly efficient M-estimators. It is interesting to note that as $c$
increases the asymptotic variance at $\Phi$ decreases, but the peak of the
$A^{(1)}_{\Phi,\delta_{|x|}}$ curve is raised. This is not surprising since
as $c \to \infty$ this estimator approaches $\bar{X}$, but it does point out
the compromise one always must make between efficiency at the normal and high
levels of protection against non-normality. In plotting
$A^{(1)}_{\Phi,\delta_{|x|}}$ curves, one defect in the above quadratic
family was discovered. A second peak in the $A^{(1)}_{\Phi,\delta_{|x|}}$
curve is caused by the $\psi$ function of the estimator descending to 0 too
rapidly; more precisely, the contribution to (3.6) from the quantity
$\psi'(x) / \int \psi'\, d\Phi$ is too large. Hence, instead of (4.3), we
adopt the family of $\psi$ functions


(4.4)    $\psi(x) = -\psi(-x) = \begin{cases} cx - x^2 , & 0 \le x \le d , \\ -e(x - f) , & d < x \le f , \\ 0 , & x \ge f , \end{cases}$

where $d$, $e$ and $f$ are chosen so that the influence curve of $\psi$,
$\psi(x) / \int \psi'\, d\Phi$, has slope $-1$ for $x \ge d$ and is
continuous. In particular, we take $c = 4.5$, which makes $d = 3.7$,
$e = 2.9$ and $f = 4.72$, and we call the corresponding M-estimator QC45. Its
$\psi$ function is thus

(4.5)    $\psi(x) = -\psi(-x) = \begin{cases} 4.5x - x^2 , & 0 \le x \le 3.7 , \\ -2.9(x - 4.72) , & 3.7 < x \le 4.72 , \\ 0 , & x > 4.72 . \end{cases}$

A plot of the influence curves of H15D, 25AR and QC45 is presented in
Figure D. In Section 5 we show how these three estimators compare with those
examined in Section 3.
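
The stated design conditions for QC45 can be checked numerically: continuity of $\psi$ at $d$, an influence-curve slope close to $-1$ on the linear piece, and the asymptotic variance at $\Phi$. The sketch below uses the constants of (4.5); the integrator and all names are assumptions of the example.

```python
import math

C, D, E, F_END = 4.5, 3.7, 2.9, 4.72  # constants c, d, e, f of (4.5)


def psi_qc45(x):
    """Quadratic centre with a linear descent to zero, as in (4.5)."""
    u = abs(x)
    if u <= D:
        v = C * u - u * u
    elif u <= F_END:
        v = -E * (u - F_END)
    else:
        v = 0.0
    return math.copysign(v, x)


def dpsi_qc45(x):
    """Piecewise derivative of psi_qc45."""
    u = abs(x)
    if u <= D:
        return C - 2.0 * u
    if u <= F_END:
        return -E
    return 0.0


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_expect(g, lim=10.0, n=20000):
    """Midpoint-rule approximation of the integral of g dPhi."""
    h = 2.0 * lim / n
    total = 0.0
    for i in range(n):
        z = -lim + (i + 0.5) * h
        total += g(z) * phi(z)
    return total * h


slope = normal_expect(dpsi_qc45)                          # integral of psi' dPhi
a_qc45 = normal_expect(lambda x: psi_qc45(x) ** 2) / slope ** 2
ic_slope_tail = -E / slope                                # IC slope on d < x <= f
gap_at_d = psi_qc45(D) - E * (F_END - D)                  # continuity check at d
print(round(a_qc45, 3), round(ic_slope_tail, 3), round(gap_at_d, 3))
```

With the rounded constants the two pieces meet at $d$ up to a small rounding gap, and the slope of the influence curve on the linear segment comes out close to $-1$.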


NOTE:  $IC(x) = \psi(x) / \int \psi'(x)\, d\Phi(x)$


5.  COMPARISON OF OLD AND NEW ESTIMATORS

Three new estimators were developed in Section 4 within the guidelines
set forth in Section 3. Here we investigate whether these new estimators are
actually improvements over the old ones. Figures E and F give the basic
comparisons among all six estimators: H15, 25A, AMT, H15D, 25AR and QC45. In
Figure E the $A^{(1)}_{\Phi,\delta_{|x|}}$ curves are displayed. We see that
H15D has the lowest peak value and easily beats H15 for $x > 3.4$. Also, 25AR
has smoothed out the large spike in 25A at $x = 3.04$, but in doing so it
remains higher than 25A for $3.5 \le x < 6.4$. The estimator QC45 is similar
to AMT in shape but is better in the range $2 \le x < 4.5$. Turning to
Figure F, which displays the $A^{(1)}_{\Phi,N_x}$ curves, we can hardly
distinguish between H15D, AMT and QC45, but 25A uniformly beats 25AR for
$x > 2$, and H15, the only non-descender, is clearly outmatched by all the
descenders.

Thus far we have examined only the $A^{(1)}_{\Phi,G}$ curves of these
estimators. This affords an effective approximate analysis of the asymptotic
variances. But what about the exact asymptotic variance in an
$\epsilon$-neighborhood of $\Phi$? Figures G, H and I plot the actual
asymptotic variances for $F = (1 - \epsilon)\Phi + \epsilon N_x$, based on
expression (2.11), for $\epsilon = .01$, $.1$ and $.25$. Note that the scale
of the ordinate is different in each figure in order to accentuate the
relative standing of the estimators. For $\epsilon = .01$, the curves merely
reflect the orderings of $A(\Phi)$ except for the generally poor behavior of
H15. In Figure H, AMT is a sure winner with H15D not far behind. Note that
25AR makes a complete switch from one of the best at $\epsilon = .01$ to the
worst (of the descenders) at $\epsilon = .1$. The same pattern holds for
$\epsilon = .25$ in Figure I: AMT, H15D and QC45 are still close together,
but all three are clearly better than 25A, and 25AR pulls further away from
the others. In general, these asymptotic variance graphs reinforce the
patterns already seen in Figure F. This strengthens our faith in using
$A^{(1)}_{\Phi,G}$ plots for the purpose of examining asymptotic variance.
Such plots allow us to summarize in a single plot the information contained
in a number of specific $\epsilon$-neighborhood situations.

In Table 1 we give the asymptotic variances for four specific
distributions: the standard normal, the Laplace (double exponential)
with scale parameter 1, the t-distribution with 3 degrees of freedom, and
the Cauchy distribution with scale parameter 1. Note that QC45 has
the largest asymptotic variance at $\Phi$ but the smallest for the other
distributions. The two best at $\Phi$, 25A and 25AR, lose out quickly
for the three longer-tailed distributions. Essentially the same pattern
emerges as was seen in Figures F - I: as one moves from $\Phi$ to
longer-tailed distributions, AMT, H15D and QC45 consistently outperform H15,
25A and 25AR. This result is not surprising when one notices that
AMT, H15D and QC45 satisfy fairly well the modified minimax objectives
proposed at the conclusion of Section 3.

Also included in Table 1 are values for some specific
$\epsilon$-contaminated normal situations. The same patterns persist. This
part of the table is included primarily to coordinate with the Princeton
study. In fact, one may actually compare the values for AMT, H15 and 25A with
those given in Tables 5-7A, 5-7B and 5-7E of the Princeton study. The
numbers are fairly close when one considers that our values are exact
asymptotic values, whereas the Princeton study gives Monte Carlo
results for n = 20 and also involves simultaneous scale estimation.
This illustrates how our techniques for comparison of estimators offer
a cost-saving alternative to the Monte Carlo approach.

Typically, robust M-estimators must be found iteratively. Hence,
from computational considerations, one might want to avoid sine
functions or other functions which require Taylor expansions inside
the computer. An estimator with a simple $\psi$ function like that of
QC45 may be preferable to an estimator which barely outperforms it but
is computationally worse.
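
To illustrate the kind of iteration involved, the sketch below solves the estimating equation for the QC45 $\psi$ function with a fixed-slope update started from the sample median, using arithmetic only. The update rule, the starting value, the constant `SLOPE` (a numerically computed approximation to the integral of $\psi'$ against $\Phi$, not a figure from the text) and the sample are all assumptions of this example. The fixed-slope scheme is one possible choice, not a method prescribed here.

```python
import statistics

C, D, E, F_END = 4.5, 3.7, 2.9, 4.72  # QC45 constants from (4.5)
SLOPE = 2.904  # assumed approximate value of the integral of psi' dPhi for QC45


def psi_qc45(x):
    """QC45 psi function: quadratic centre, linear descent, zero beyond F_END."""
    u = abs(x)
    if u <= D:
        v = C * u - u * u
    elif u <= F_END:
        v = -E * (u - F_END)
    else:
        v = 0.0
    return v if x >= 0 else -v


def m_estimate(data, psi, slope, steps=100):
    """Fixed-slope iteration t <- t + mean(psi(x_i - t)) / slope, from the median."""
    t = statistics.median(data)
    for _ in range(steps):
        t += sum(psi(x - t) for x in data) / (slope * len(data))
    return t


data = [-1.0, 0.0, 1.0, 50.0]  # hypothetical sample (scale 1) with a gross outlier
estimate = m_estimate(data, psi_qc45, SLOPE)
print(statistics.mean(data), estimate)
```

In this example the observation at 50 falls beyond the rejection point and contributes nothing to the estimating equation, so the iteration settles at the centre of the remaining points while the sample mean is pulled far to the right.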

TABLE 1.  Asymptotic Variances of Selected Estimators at Selected Distributions

                                     Estimator
Distribution      AMT      QC45     H15      H15D     25A      25AR

$\Phi$            1.042    1.054    1.037    1.045    1.025    1.024
Laplace           1.448    1.385    1.465    1.473    1.512    1.515
$t_3$             1.562    1.542    1.582    1.561    1.585    1.592
Cauchy            2.401    2.325    2.985    2.417    2.559    2.615
W(.01, 3)†        1.062    1.074    1.061    1.065    1.047    1.047
W(.025, 3)        1.094    1.105    1.097    1.097    1.082    1.084
W(.05, 3)         1.150    1.160    1.160    1.152    1.142    1.147
W(.1, 3)          1.273    1.282    1.295    1.274    1.274    1.284
W(.25, 3)         1.762    1.762    1.800    1.755    1.787    1.813
W(.05, 10)        1.127    1.140    1.227    1.131    1.122    1.128
W(.1, 10)         1.226    1.238    1.448    1.231    1.234    1.249
W(.25, 10)        1.633    1.648    2.385    1.644    1.706    1.762

†Discussed in Sec. 5.

NOTE:  The entries are the variances in the asymptotic normal distributions
of the normalized estimators, when the observations have the specified
distributions. The distributions are as follows: $\Phi$ is standard
normal; Laplace denotes the density $f(x) = \tfrac{1}{2} \exp(-|x|)$; $t_3$
denotes the density $f(x) = \dfrac{2}{\pi \sqrt{3}} (1 + x^2/3)^{-2}$; Cauchy
denotes the density $f(x) = \dfrac{1}{\pi (1 + x^2)}$; $W(\epsilon, c)$
denotes the c.d.f. $(1 - \epsilon)\Phi(x) + \epsilon \Phi(x/c)$.
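
Individual entries of Table 1 can be spot-checked from the densities in the note. The sketch below does this for the H15 column at the Laplace and $t_3$ distributions. It uses the fact that, for the truncated $\psi$ of (3.2), $\psi'$ is the indicator of $|x| < k$ and $\psi^2 = k^2$ outside that interval, so both integrals in (1.4) reduce to integrals over $[-k, k]$. The helper names and the midpoint rule are assumptions of this example.

```python
import math

K = 1.5  # H15 truncation point


def integrate(g, lo, hi, n=6000):
    """Midpoint-rule integral of g over [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        total += g(lo + (i + 0.5) * h)
    return total * h


def laplace_pdf(x):
    return 0.5 * math.exp(-abs(x))


def t3_pdf(x):
    return 2.0 / (math.pi * math.sqrt(3.0)) * (1.0 + x * x / 3.0) ** -2


def h15_variance(pdf):
    """A(F) from (1.4) for the truncated psi, given the density of F."""
    inside = integrate(pdf, -K, K)  # integral of psi' dF = P(|X| < K)
    # psi^2 equals K^2 outside [-K, K]; correct for the interior where psi^2 = x^2
    second = K * K + integrate(lambda x: (x * x - K * K) * pdf(x), -K, K)
    return second / inside ** 2


print(round(h15_variance(laplace_pdf), 3), round(h15_variance(t3_pdf), 3))
```

Both values should land close to the corresponding H15 entries of the table, which gives an independent check on the tabulated figures.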


6.  GENERALIZATIONS  AND  RAMIFICATIONS 


We have attempted to present the functional techniques of Section 2 and
their applications in the simplest situation: M-estimation in the normal
location problem. Generalizations in several directions are apparent.

For example, the asymptotic variance functionals for other types of
estimators such as L-estimators (linear combinations of order statistics) or
R-estimators (rank statistics) may be investigated. The techniques would be
the same: find the von Mises derivative, which we denoted
$A^{(1)}_{F_0,G}$, and examine it for different choices of
$\epsilon$-neighborhoods, $F = (1 - \epsilon)F_0 + \epsilon G$.

Further, it is not necessary that $F_0 = \Phi$, although this is the most
common base distribution. One could choose $F_0$ to be the Laplace or some
other theoretically reasonable base distribution. It is not even necessary
that $F_0$ or $G$ be assumed symmetric as long as the asymptotic variance
functional is well-defined. The technical problem of what exactly is being
estimated in these asymmetric cases is discussed by Bickel and Lehmann [3].
Indeed, Collins [4] derives optimal $\psi$ functions for such cases, and we
note that one variety of solution is a "descender."

A third type of extension involves simultaneous estimation of location
and scale parameters. In these cases the expression for $A^{(1)}_{F_0,G}$ is
similar to that derived in Section 2, but with an extra term due to
estimation of the scale parameter. Otherwise the analysis follows as before.
Note that the actual use of an M-estimator of location on real data requires
simultaneous scale estimation.

Clearly the complexity of a problem is increased by the inclusion of one
or more of these generalizations. Nevertheless the techniques are
straightforward and may be applied to a large class of problems.